Multilevel/Hierarchical Models

Setup

For this tutorial, you will need to install the lme4 package. This package contains all the functions we will need to begin fitting multilevel models in R. We also use the DiagrammeR package for making diagrams.

# Install the lme4 package

install.packages('lme4')
install.packages("DiagrammeR")

Introduction

So far in class we have covered a class of models that put a rather strict assumption on the structure of the data. That is, that the observations are independent of one another. In this tutorial and the one that follows we will cover a set of regression models where the data are structured in groups and coefficients can vary by group! In my own personal work, I have found many uses for these models as data that is hierarchicaclly structured arises naturally.

First some terminology. You will find these types of models referecenced under various terms, often used interchangeably. Here I list a few:

  • Multilevel Models
  • Hierarchical Models
  • Mixed Effects models
  • Random Effects models

We will start simply by giving some examples of the types of data that we will model with hierarchical models. We will then look at some random intercept models followed by some random slope models. All data can be found in the data/ folder of this github repository.

Example data

Multilevel models are used when we want to model how relationships vary across units (levels). Data from multilevel models are often (but not always) would as coming from a nested structure. For example”

  • Students within classes within schools
  • Students within project groups within a class
  • Residents within apartment buildings within cities
  • Patients treated by doctors
  • Patients within treatment groups measured over time

When presented with this type of data, it is useful to think about the stucture and map out the different levels.

For example: We want to model student math scores from students in 25 different scools from 5 cities. We would map out these levels as:

  • Level 1: the student
  • Level 2: the classroom
  • Level 3: the school
  • Level 4: the city

We do this because we may believe that any effects of interest may vary by one or more of these groups. The effect of eating lunch before a math test may vary between classrooms, or schools or cities (or all of them).

Another example: patients treated by 10 doctors have their weight measured 4 times over the course of a month. We could map out this structure as:

  • Level 1: individual weight measurements
  • Level 2: the patient
  • Level 3: the doctor

Here we have measurements nested within a patient nested within a doctor.

Pause here to try to think of another example multilevel model example.

Motivating data example

Here we motivate the models with the use of some sample data from the Cherry Blossom Running Race. This data comes from the github repo of Modern Data Science with R

library(dplyr)
library(gt)

cherry <- readr::read_csv('data/cherry-blossom-race.csv')

cherry %>% 
  head() %>% 
  gt() %>% 
  tab_header(title = "Cherry Blossom Running Race",
             subtitle = "A sub-sample of outcomes for the annual Cherry Blossom Ten Mile race in Washington, D.C")
Cherry Blossom Running Race
A sub-sample of outcomes for the annual Cherry Blossom Ten Mile race in Washington, D.C
runner age net gun year previous
1 53 83.98333 87.03333 2002 0
1 54 74.30000 74.56667 2003 1
1 55 75.15000 75.26667 2004 2
1 56 74.21667 74.40000 2005 3
1 57 74.25000 74.33333 2006 4
1 58 NA 74.83333 2007 5

The data set contains the following variables:

  • runner: a unique identifier for the runner
  • age: age of the runner
  • net: time to complete the race, from starting line to finish line (minutes)
  • gun: time between the official start of the of race and the finish line (minutes)

The outcome for this data will be the variable net: the time for each runner between the start line and finish line. Let’s explore the data a little.

# How many runners
n_distinct(cherry$runner)
## [1] 36
# How many races for each runner

cherry %>% 
  group_by(runner) %>% 
  summarize(`number of races` = n(),
         `age at first race` = min(age),
         `age at last race` = max(age),
         `mean race time` = round(mean(net, na.rm = T), 2),
         `sd race time` = round(sd(net, na.rm = T), 1),
         `missing values in outcome` = sum(is.na(net))) %>% 
  gt() %>% 
  tab_header(title = "Summary statistics for each runner")
Summary statistics for each runner
runner number of races age at first race age at last race mean race time sd race time missing values in outcome
1 7 53 59 76.38 4.3 2
2 7 52 59 83.97 3.9 2
3 7 51 57 89.94 4.1 1
4 7 52 61 103.05 4.4 2
5 7 51 58 76.45 5.9 1
6 7 54 63 77.40 3.4 3
7 7 51 58 101.62 4.6 2
8 7 52 61 104.37 4.9 3
9 7 54 62 100.79 8.2 2
10 7 52 61 117.61 9.3 2
11 7 52 58 89.45 8.5 2
12 7 51 59 71.77 5.4 2
13 7 53 59 82.22 5.7 2
14 7 53 59 96.71 3.9 0
15 7 52 59 96.06 4.5 2
16 7 52 60 93.90 3.5 2
17 7 50 58 100.06 17.5 2
18 7 50 59 101.60 4.2 3
19 7 52 61 83.15 3.0 3
20 7 50 58 108.03 8.8 2
21 7 52 58 86.10 4.1 1
22 7 50 58 87.94 4.1 2
23 7 51 58 103.95 4.5 1
24 7 54 60 106.15 5.1 1
25 7 51 59 101.48 6.2 2
26 7 50 58 85.17 2.3 2
27 7 50 57 98.48 6.9 1
28 7 50 57 75.21 2.1 2
29 7 54 60 63.06 1.0 1
30 7 53 60 66.55 3.9 1
31 7 52 60 82.55 4.0 2
32 7 50 58 78.66 2.9 2
33 7 54 63 85.64 3.2 3
34 7 52 60 75.82 4.7 2
35 7 52 59 86.97 2.6 1
36 7 51 60 107.49 5.7 3
cherry %>% 
  ggplot(aes(net)) +
  geom_histogram() +
  labs(x = "time to complete the race (minutes)")

cherry %>% 
  ggplot(aes(factor(runner), net)) +
  geom_boxplot() +
  labs(x = 'Runner ID',
       y = "time to complete the race (minutes)") 

There are 36 runners in the data and they each completed 7 races.

We can see that there is a lot of variability within and between runners. For example, runner 17 has a lot of variability in each of their runs, while runner 12 has very little variability. Meanwhile runner 12 runs consistently faster race times than runner 17.

When we plot age against running time, we see a very weak association between age and running times. We can run a linear regression model to attempt to measure this effect.

cherry %>% 
  ggplot(aes(age, net)) +
  geom_point() +
  labs(x = 'age in years',
       y = "time to complete the race (minutes)") +
  geom_smooth(method = "lm")

lm_model <- lm(net ~ age, data = cherry) 

sjPlot::tab_model(lm_model)
  net
Predictors Estimates CI p
(Intercept) 75.06 26.68 – 123.44 0.003
age 0.27 -0.61 – 1.15 0.544
Observations 185
R2 / R2 adjusted 0.002 / -0.003

The age coefficient is small and is not statistically significant ( p = 0.544). Are we to conclude that age has no effect on running times?

Let’s consider the model we just fit. We will plot the fitted regression line for a few sample runners.

`

cherry %>% 
  filter(runner %in% 1:8) %>% 
  ggplot(aes(age, net)) +
  geom_point() +
  facet_wrap(~ runner, nrow=2) +
  geom_abline(intercept = coef(lm_model)[1],
              slope = coef(lm_model)[2]) + 
  labs(x = 'age in years',
       y = "time to complete the race (minutes)") 

Above we selected runners 1 through 8, plotted their age against run times and added the fitted regression line from the linear regression model.

Does this line fit the data well for these 8 runners? For runner 3 perhaps, but I would argue it does not fit very well for the other 7 runners. For most of these runners we see a clear pattern for their running times increase as their age increases. They all have slightly different mean running times as well.

Below we plot a fitted linear regression line for each runner based on their 7 data points as well as the overall regression line.

cherry %>% 
  ggplot(aes(age, net, color = factor(runner))) +
  geom_point() +
  labs(x = 'age in years',
       y = "time to complete the race (minutes)",
       color = "Runner ID") +
  geom_smooth(method = "lm", se=F)+
  geom_abline(intercept = coef(lm_model)[1],
              slope = coef(lm_model)[2], size = 2, color = "blue")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 67 rows containing non-finite values (stat_smooth).
## Warning: Removed 67 rows containing missing values (geom_point).

Complete Pooling

The simple regression model we fit is called the complete pooling model. In this model, we pool all 252 running times (\(y_1, y_2, \ldots y_{252}\)) into one population pool.

We are making the following assumptions:

  • each observation is independent of the others
  • information about the individual runners is irrelevant to our model of running time vs age.

The first assumption is clearly false as we are measuring the same runners several times. Though the observations on one runner might be independent of those on another, the observations within a runner are correlated. The second assumption takes a little more thought, but it is also incorrect.

We’ve seen above that by making these assumptions, this model has been unable to determine that people tend to get slower as they age.

When we are presented with a nested data structure, the complete pooling model will violate the assumption of independence and we will often make misleading conclusions about any relationships we are investigating.

The No Pooling Model

We’ve seen what happens when we completely pool all of our observations. What is the opposite approach? The no pooling model considers each of our 36 runners separately

cherry %>% 
  ggplot(aes(age, net)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  facet_wrap(~ runner) 
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 67 rows containing non-finite values (stat_smooth).
## Warning: Removed 67 rows containing missing values (geom_point).

This model appears to fit the data quite well! The model(s) have picked up on the fact that the effect of age is different between runners and that the mean values vary between runners.

There are problems with this model however.

What if you choose to run in this race next year. How can we predict your race time given your age? You can’t estimate this from the no pooling model!

There is a second issue as well. Let’s look at runner 1 a little more closely.

cherry %>% 
  filter(runner == 1) %>% 
  ggplot(aes(age, net)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, fullrange = TRUE) 
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).

Based on the rest of the data we have, should we assume that runner 1 will continue to see decreased running times?

In the no pooled approach, we are only considering data from runner 1 when making this conclusion. We have data from 35 other runners, and we can make use of this data to know that runners typically slow down with age. Perhaps runner 1 will in fact have a faster race time over the next few years, but not as steep as the no pooled approach would suggest.

Let’s summarize these thoughts. With the no pooled approach

  • We cannot generate predictions for new samples that fall outside the data we have modeled
  • The no pooled approach assumes that information from one runner contains no relevant information for any of the other runners (this is true in general outside the running example)

Multilevel model (partial pooling)

Multilevel models provide a sort of in between of the complete pooling and no pooling approaches, known as partial pooling.

Multilevel models assume that although each group is unique, all groups are connected (having been sampled from some population) and might contain valuable information about one another.

The graph below shows a multilevel data structure in general. For your reference I have included the DiagrammeR code I wrote to display this structure. DiagrammeR is an incredibly useful tool for drawing complex diagrams.

As a short tutorial:

  • you wrap all your code in quote with a call to grViz.
  • create your graph with a digraph statement. I have named my graph multilevel_model
  • use a graph statement to give overall attributes (font, size, etc) to your entire graph
  • use a node statement to create nodes (give attributes inside brackets)
    • after a node statement you can name as many nodes as you want (separated by a semicolon)
    • When you give a node a name (e.g Group1), you can also give it a label inside square brackets. Here I am using html to give subscripts. For example, [label=<Group<SUB>1</SUB>>] will label my node \(Group_1\).
  • add edges between nodes with edge statements. For example Group1->y_11 will draw a line between the node named Group1 and the node names y_11

See the docs for more information.

library(DiagrammeR)
## Warning: package 'DiagrammeR' was built under R version 4.1.2
grViz("
digraph multilevel_model {

  # a 'graph' statement
  graph [overlap = true, fontsize = 10]

  # several 'node' statements
  node [shape = box,
        fontname = Helvetica]
  Population
  
   node [shape = box,
        fontname = Helvetica]
  Group1[label=<Group<SUB>1</SUB>>];
  Group2[label=<Group<SUB>2</SUB>>]; 
  group_dot[label=<...>]; 
  Groupm[label=<Group<SUB>m</SUB>>]
  

  node [shape = circle,
        fixedsize = true,
        width = 0.9] // sets as circles
  y_11[label=<y<SUB>11</SUB>>]; 
  y_21[label=<y<SUB>21</SUB>>]; 
  y_31[label=<y<SUB>31</SUB>>]; 
  
  y_12[label=<y<SUB>12</SUB>>]; 
  y_22[label=<y<SUB>22</SUB>>]; 
  y_32[label=<y<SUB>32</SUB>>];
  y_dot[label=<...>]; 
  
  y_1m[label=<y<SUB>1m</SUB>>]; 
  y_2m[label=<y<SUB>2m</SUB>>]; 
  y_3m[label=<y<SUB>3m</SUB>>]

  # several 'edge' statements
  Population->Group1 Population->Group2 Population ->group_dot Population->Groupm
  Group1->y_11 Group1->y_21 Group1->y_31 
  Group2->y_12 Group2->y_22 Group2->y_32 
  group_dot -> y_dot
  Groupm->y_1m Groupm->y_2m Groupm->y_3m 
}
")

We will go into details shortly, but below I will fit a random intercept model to this data, and then plot the regression lines for the complete pool model, the no pool model and the partial pool model for 4 example runners and then a hypothetical new runner (you).

# first the no pooling model
no_pool_model <- cherry %>% 
  group_by(runner) %>% 
  do(fit_age = broom::tidy(lm(net ~ age, data = .))) %>% 
  unnest(fit_age) %>% 
  select(runner, term, estimate) %>% 
  pivot_wider(names_from = term, values_from = estimate) %>% 
  rename(intercept = `(Intercept)`, slope = age)

# look at the first few observations of this object.

no_pool_model %>% 
  head %>% 
  gt() %>% 
  tab_header(title = "No pooling model",
             subtitle = "First few runner coefficients")
No pooling model
First few runner coefficients
runner intercept slope
1 183.90500 -1.95500000
2 20.43000 1.17666667
3 86.49683 0.06428571
4 40.31261 1.13648649
5 -11.22422 1.62867495
6 15.64622 1.06022222
#####################################
# PARTIAL POOLING RANDOM INTERCEPT MODEL

library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
random_int_model <- lmer(net ~ age + (1 | runner), data = cherry)

random_intercepts <- ranef(random_int_model)
fixed_effects <- fixef(random_int_model)
partial_pooling <- tibble(
  runner = 1:36,
  partial_pool_intercept = fixed_effects[1] + random_intercepts$runner$`(Intercept)`,
  partial_pool_slope = fixed_effects[2]
)

# plot for runners 4 runners

cherry %>% 
  filter(runner %in% c(1, 20, 22, 25)) %>% 
  left_join(no_pool_model, by = "runner") %>% 
   left_join(partial_pooling, by = "runner") %>% 
  ggplot(aes(age, net)) +
  geom_point() +
  geom_abline(intercept = coef(lm_model)[1],
              slope = coef(lm_model)[2], color = "blue") +
  geom_abline(aes(intercept = intercept, slope = slope,
                  color = "No Pool")) +
  geom_abline(aes(intercept = partial_pool_intercept, slope = partial_pool_slope,
                  color = "Partial Pool")) +
  facet_wrap(~ runner) +
  coord_cartesian(ylim = c(70, 120))
## Warning: Removed 8 rows containing missing values (geom_point).

# We can estimate a new runner's time as well

new_runner <- tibble(runner = 37, age = 50:70)
predictions <- predict(random_int_model, new_runner, allow.new.levels = T)
new_runner %>% 
  mutate(predictions = predictions) %>% 
  ggplot(aes(age, predictions)) +
  geom_point() + 
  geom_line() +
  geom_abline(intercept = coef(lm_model)[1],
              slope = coef(lm_model)[2], color = "blue")+
  coord_cartesian(ylim = c(70, 120)) +
  ggtitle("Estimated running time for a new runner")

Notice the use of allow.new.levels = T in the prediction. The blue line is the complete pooling estiate while the black line is the estimate from our multilevel model.

Our partial pooling model doesn’t ignore the population from which our runners were sampled. We can use our complete pooling or partial pooling (hierarchical model) to generate predictions for a new runner. The results are quite different between the two models. The complete pooling model hasnt picked up on the fact that runners get slower as they age, while the multilevel model has. The rate at which you decline is equivalent to the average decline across the 36 runners from the sample.

Random intercept

We often think of multilevel linear models as an extension of the linear model:

\[y_i = \alpha + \beta x_i + \epsilon_i\] where we allow for varying intercepts

\[y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i\] and further extend to allow for varying slopes by group \[y_i = \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i\]

Notation

  • The smallest item of measurement is called a unit and is index as \(i = 1\ldots n\).
  • The outcome measurement \(y = (y_1, \ldots y_n)\) is at the unit level.
  • Predictors are represented by an \(n\times k\) matrix \(X\) so that the vector of predicted values \(\hat{y} = X\beta\).
  • For each unit \(i\), we denote the row vector \(X_i\) so that \(\hat{y_i} = X\beta\) is the prediction for unit \(i\)
  • We labels groups as \(j = 1,\ldots J\). This works for single level grouping (e.g. students in schools, people over time, patients within doctors, etc)
  • If we need a second level of grouping we will use \(k = 1,\ldots, K\).
  • Index variables \(j[i]\) code group membership. If \(j[14]=3\) then the 12the unit in the data \(i=12\) belongs to group 3.

We specify the varying-intercept model as

\[y_i \sim N(\alpha_{j[i]} + \beta x_i, \sigma_y^2)\]

or as

\[y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i\]

With multiple predictors we write the model as:

\[y_i \sim N(X_iB, \sigma_y^2)\]

Where \(B\) is a matrix of coefficients.

  • Standard deviation is \(\sigma_y\) for data level errors, and \(\sigma_{\alpha\) \(\sigma_{\beta}\) for group level errors.
  • Group level predictors are represented by a matrix \(U\) with \(J\) rows. For example, in the group level model \(\alpha_j \sim N(U_j\gamma, \sigma_{\alpha}^2)\). If we have a single group level predictor, we label it as lowercase \(u\).

Complete pooling with no predictors

Remember that multilevel regression can be thought of as a method for compromising between the two extremes of excluding a categorical predictor from a model (complete pooling), or estimating separate models within each level of the categorical predictor (no pooling).

We will use the radon.csv dataset which contains Radon measurements of 919 owner-occupied homes in 85 counties in Minnesota.

The data has the following variables

  • log.radon: Radon measurement (in log pCi/L, i.e., log picoCurie per liter)
  • basement: Indicator for the level of the home at which the radon measurement was taken - 0 = basement, 1 = first floor.
  • uranium: Average county-level soil uranium content.
  • county: county ID
  • count.name: county name as a factor
radon <- readr::read_csv('data/radon.csv')

Let’s plot the complete pooling, no pooling and multilevel model partial pooling for this data. Here we are interested in the log radon measurements per county

# for extracting the standard errors
source('helpers.R')

# the complete pooling estimates
mean_log_radon <- mean(radon$log.radon)

# no pooling estimates
count_means <- radon %>% 
  group_by(county) %>% 
  summarize(mean_radon = mean(log.radon),
            county_var = var(log.radon),
            sample_size = n()) %>% 
  ungroup() %>% 
  mutate(county_sds =  mean(sqrt(county_var[!is.na(county_var)]))/sqrt(sample_size))
            
# plot of mean no pooling and complete pooling estimates
p1 <- count_means %>% 
  ggplot(aes(sample_size , mean_radon)) +
  geom_point() + 
  geom_errorbar(aes(ymin = mean_radon - county_sds, 
                    ymax = mean_radon + county_sds)) +
  geom_hline(yintercept =mean_log_radon, color = 'blue', size = 1) +
  ggtitle("No pooling and complete pooling estimates") +
  coord_cartesian(ylim = c(0, 3.5))

# make a multilevel model to get the partial pooling estiates
random_int_model <- lmer(log.radon ~ 1 + (1| county), data = radon)

# extract the standard errors
se <- random_int_model %>% 
  ranefSE()%>%
  rename(intercept=`(Intercept)`,se=`se.(Intercept)`)

# make the partial pooling plot
p2 <- count_means %>% 
  mutate(model_intercept = coef(random_int_model)$county$`(Intercept)`) %>% 
  bind_cols(se) %>% 
  ggplot(aes(sample_size , model_intercept)) +
  geom_point() + 
  geom_errorbar(aes(ymin = model_intercept - se, 
                    ymax = model_intercept + se)) +
  geom_hline(yintercept =mean_log_radon, color = 'blue', size = 1) +
  ggtitle("Partial pooling estimates")+
  coord_cartesian(ylim = c(0, 3.5))


cowplot::plot_grid(p1, p2)

Complete pooling ignores variation between counties while the no-pooling analysis overstates it. Notice how counties with fewer observations shrink towards the population mean while data with more observations tend to shrink less. This intuitively makes sense. The more data we have, the more certain we are of the estimates.

The goal of estimation is the average log radon level \(\alpha_j\) for each county \(j\). For each county we have a sample size \(n_j\) .

In this simple case we can get the multilevel estimate for a given county \(j\) as a weighted average of the mean of the observations in the county:

\[\alpha_{j}^{\text{multilevel}} \approx \frac{\frac{n_j}{\sigma_y^2}\bar{y}_j + \frac{1}{\sigma_{\alpha}^2}\bar{y}_{all} }{\frac{n_j}{\sigma_y^2} + \frac{1}{\sigma_{\alpha}^2} }\]

Adding predictors

We will add the predictor basement (0 = basement, 1 = first floor). We can generate estimates for our complete pooling, no pooling, and partial pooling as follows.

complete_pool <- lm(log.radon ~ basement, data = radon)

summary(complete_pool)
## 
## Call:
## lm(formula = log.radon ~ basement, data = radon)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6293 -0.5383  0.0342  0.5603  2.5486 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.32674    0.02972  44.640   <2e-16 ***
## basement    -0.61339    0.07284  -8.421   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8226 on 917 degrees of freedom
## Multiple R-squared:  0.07178,    Adjusted R-squared:  0.07077 
## F-statistic: 70.91 on 1 and 917 DF,  p-value: < 2.2e-16
coef(complete_pool)[2]
##   basement 
## -0.6133948
no_pool <- lm(log.radon ~ basement + factor(county) - 1, data = radon)
coef(no_pool)[1]
##  basement 
## -0.720539
partial_pool <- lmer(log.radon ~ basement + (1 | county), data = radon)
fixef(partial_pool)[2]
##   basement 
## -0.6929937

I leave it as an exercise to make a similar plot as for our first model (for a few sampled counties), but including our basement predictor.

In general, we see the same problems here as we did in the first example using the runner data.

Specifying a multilevel model

With this radon data, the simplest multilevel model is specified as:

\[y_i \sim N(\alpha_{j[i]} + \beta x_i, \sigma_y^2), \text{ for } i = 1, \ldots, n\]

This looks similar to the no-pooling model, but the difference is that the \(\alpha_j\) are set to the least squares estimate in the no pooling model.

In the multilevel model we put a soft constraint on the \(\alpha_j\)’s. They are assigned a probability distribution

\[ \alpha_j \sim N(\mu_{\alpha}, \sigma_{\alpha}^2), \text{ for } j = 1, \ldots, J\]

The mean \(\mu_{\alpha}\) and standard deviation \(\sigma_{\alpha}\) get estimated using the data. This constraint has the effect of pulling the estimates \(\alpha_j\) towards the population mean \(\mu_{\alpha}\).

Let’s look at these estimates from our fitted model. We can use VarCorr() to extract the standard deviations from our y values and the intercept. We can extract fixed effect with fixef()

# 
multievel_model <- lmer(log.radon ~ basement + (1 | county), data = radon)

sigmas <- VarCorr(multievel_model)

print(sigmas)
##  Groups   Name        Std.Dev.
##  county   (Intercept) 0.32822 
##  Residual             0.75559
fixed_effects <- fixef(multievel_model)

print(fixed_effects)
## (Intercept)    basement 
##   1.4615979  -0.6929937

So with this model we have,

  • \(\sigma_{\alpha}\) = 0.328
  • \(\mu_{\alpha}\) = 1.46
  • \(\sigma_{\y}\) = 0.756
  • \(\beta\) = -0.693

More on the lmer function

We will be using lmer to fit linear multilevel models and glmer to fit generalized linear multilevel models.

This function works similar to the lm and glm functions we have used in the past. The exception is that when specifying random effects (both intercepts and slopes). Below we fit a model with a random intercept and extract components from the model.

# random intercept model

random_intercept <- lmer(log.radon ~ basement + (1 | county), data = radon)

# model summary
summary(random_intercept)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log.radon ~ basement + (1 | county)
##    Data: radon
## 
## REML criterion at convergence: 2171.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3989 -0.6155  0.0029  0.6405  3.4281 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  county   (Intercept) 0.1077   0.3282  
##  Residual             0.5709   0.7556  
## Number of obs: 919, groups:  county, 85
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  1.46160    0.05158  28.339
## basement    -0.69299    0.07043  -9.839
## 
## Correlation of Fixed Effects:
##          (Intr)
## basement -0.288
# estimated coefficients

model_coef <- coef(random_intercept)
model_coef$county %>% head()
##   (Intercept)   basement
## 1   1.1915003 -0.6929937
## 2   0.9276468 -0.6929937
## 3   1.4792143 -0.6929937
## 4   1.5045012 -0.6929937
## 5   1.4461503 -0.6929937
## 6   1.4801817 -0.6929937
# fixed effects

fixef(random_intercept)
## (Intercept)    basement 
##   1.4615979  -0.6929937
# random effects 
random_effects <- ranef(random_intercept)
random_effects$county %>% head()
##   (Intercept)
## 1 -0.27009754
## 2 -0.53395109
## 3  0.01761646
## 4  0.04290332
## 5 -0.01544759
## 6  0.01858386
# standard errors fixed effects
se.fixef(random_intercept)
## (Intercept)    basement 
##  0.05157623  0.07043081
# standard errors random effects
random_effects <- se.ranef(random_intercept)
random_effects$county %>% head()
##   (Intercept)
## 1  0.24777409
## 2  0.09981833
## 3  0.26227671
## 4  0.21544775
## 5  0.24777409
## 6  0.26227671

Good news is that other packages we have used like SjPlot play nicely with lmer objects

sjPlot::tab_model(random_intercept)
## Registered S3 methods overwritten by 'parameters':
##   method                           from      
##   as.double.parameters_kurtosis    datawizard
##   as.double.parameters_skewness    datawizard
##   as.double.parameters_smoothness  datawizard
##   as.numeric.parameters_kurtosis   datawizard
##   as.numeric.parameters_skewness   datawizard
##   as.numeric.parameters_smoothness datawizard
##   print.parameters_distribution    datawizard
##   print.parameters_kurtosis        datawizard
##   print.parameters_skewness        datawizard
##   summary.parameters_kurtosis      datawizard
##   summary.parameters_skewness      datawizard
  log radon
Predictors Estimates CI p
(Intercept) 1.46 1.36 – 1.56 <0.001
basement -0.69 -0.83 – -0.55 <0.001
Random Effects
σ2 0.57
τ00 county 0.11
ICC 0.16
N county 85
Observations 919
Marginal R2 / Conditional R2 0.090 / 0.234

Random slope model

The next step in multilevel modeling is to allow more than one regression coefficient to vary by group.

We can extend the model we have been working with so far as follows:

\[y_i \sim N(\alpha_{j[i]} + \beta_{j[i]} x_i, \sigma_y^2)\]

\[ \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} = N\Big( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \begin{pmatrix} \sigma_{\alpha}^2 & \rho\sigma_{\alpha}\sigma_{\beta} \\ \sigma_{\alpha}^2 & \sigma_{\beta}^2 \end{pmatrix}\Big), \text{ for } j = 1, \ldots, J \]

random_slope <- lmer(log.radon ~ basement + (1 + basement |county), data = radon)

summary(random_slope)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log.radon ~ basement + (1 + basement | county)
##    Data: radon
## 
## REML criterion at convergence: 2168.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.4044 -0.6224  0.0138  0.6123  3.5682 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  county   (Intercept) 0.1216   0.3487        
##           basement    0.1181   0.3436   -0.34
##  Residual             0.5567   0.7462        
## Number of obs: 919, groups:  county, 85
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  1.46277    0.05387  27.155
## basement    -0.68110    0.08758  -7.777
## 
## Correlation of Fixed Effects:
##          (Intr)
## basement -0.381
model_coef <- coef(random_slope)
model_coef$county %>% head()
##   (Intercept)   basement
## 1   1.1445374 -0.5406207
## 2   0.9333795 -0.7708213
## 3   1.4716909 -0.6688831
## 4   1.5354352 -0.7525548
## 5   1.4270378 -0.6206707
## 6   1.4826560 -0.6877094
# fixed effects

fixef(random_slope)
## (Intercept)    basement 
##   1.4627700  -0.6810982
# random effects 
random_effects <- ranef(random_slope)
random_effects$county %>% head()
##    (Intercept)     basement
## 1 -0.318232574  0.140477472
## 2 -0.529390508 -0.089723141
## 3  0.008920884  0.012215066
## 4  0.072665215 -0.071456546
## 5 -0.035732220  0.060427482
## 6  0.019886022 -0.006611238
# standard errors fixed effects
se.fixef(random_slope)
## (Intercept)    basement 
##  0.05386725  0.08758298
# standard errors random effects
random_effects <- se.ranef(random_slope)
random_effects$county %>% head()
##   (Intercept)  basement
## 1   0.2645615 0.3185976
## 2   0.1011331 0.2650752
## 3   0.2990128 0.3158159
## 4   0.2544841 0.2907492
## 5   0.2645615 0.3185976
## 6   0.2710295 0.3357723

We often want to plot these random intercepts for each level of our grouping variable. The sjPlot package comes with a handy function for this

# random effects plot
sjPlot::plot_model(random_slope, type = "re")
## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.4.0
## Current Matrix version is 1.3.4
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package

Adding group level predictors

One of the powerful extensions of multilevel models is the ability to add predictors to the \(\mu_\alpha\) term(s). Here we briefly describe this model and show how to fit it. We will get into more details in the next lecture.

This model is specified as

\[ \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} = N\Big( \begin{pmatrix} \gamma_0^\alpha + \gamma_1^\alpha\mu_j \\ \gamma_0^\beta + \gamma_1^\beta\mu_j \end{pmatrix}, \begin{pmatrix} \sigma_{\alpha}^2 & \rho\sigma_{\alpha}\sigma_{\beta} \\ \sigma_{\alpha}^2 & \sigma_{\beta}^2 \end{pmatrix}\Big), \text{ for } j = 1, \ldots, J \]

Here we will add in the county level soil uranium measurements:

random_slope_full <- lmer(log.radon ~ basement + uranium +  (1  |county),
                         data = radon)

summary(random_slope_full)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log.radon ~ basement + uranium + (1 | county)
##    Data: radon
## 
## REML criterion at convergence: 2134.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.9673 -0.6117  0.0274  0.6555  3.3848 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  county   (Intercept) 0.02446  0.1564  
##  Residual             0.57523  0.7584  
## Number of obs: 919, groups:  county, 85
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  1.46576    0.03794  38.633
## basement    -0.66824    0.06880  -9.713
## uranium      0.72027    0.09176   7.849
## 
## Correlation of Fixed Effects:
##          (Intr) basmnt
## basement -0.357       
## uranium   0.145 -0.009
model_coef <- coef(random_slope_full)
model_coef$county %>% head()
##   (Intercept)   basement   uranium
## 1    1.445120 -0.6682448 0.7202676
## 2    1.477009 -0.6682448 0.7202676
## 3    1.478185 -0.6682448 0.7202676
## 4    1.576891 -0.6682448 0.7202676
## 5    1.473999 -0.6682448 0.7202676
## 6    1.439566 -0.6682448 0.7202676
# fixed effects

fixef(random_slope_full)
## (Intercept)    basement     uranium 
##   1.4657628  -0.6682448   0.7202676
# random effects 
random_effects <- ranef(random_slope_full)
random_effects$county %>% head()
##    (Intercept)
## 1 -0.020642363
## 2  0.011245769
## 3  0.012422028
## 4  0.111128434
## 5  0.008235846
## 6 -0.026196357
# standard errors fixed effects
se.fixef(random_slope_full)
## (Intercept)    basement     uranium 
##  0.03794064  0.06880055  0.09176259
# standard errors random effects
random_effects <- se.ranef(random_slope_full)
random_effects$county %>% head()
##   (Intercept)
## 1  0.14458779
## 2  0.08727756
## 3  0.14728903
## 4  0.13729671
## 5  0.14458779
## 6  0.14728903

In this model, \(\gamma_0^\alpha\), \(\gamma_0^\beta\), \(\gamma_1^\alpha\), are given by:

  • intercept
  • basement
  • uranium

Below we plot the alpha estimates plus or minus se against the uranium measurements.This shows how uranium is predictive of the intercept for each county!

# Get the county level uranium measurements
uranium <- radon %>% 
  group_by(county) %>% 
  slice(1) %>% 
  ungroup() %>% 
  select(county, uranium)

# get the fixed effect
fixed_effects <- fixef(random_slope_full)


# get the estimated alpha 
alpha_hat<- fixef(random_slope_full)[1] +
  fixef(random_slope_full)[3]*uranium$uranium + 
  as.vector(ranef(random_slope_full)$county) %>% 
  pull(`(Intercept)`)

# get the standard errors of the alpha
alpha_se <- se.ranef(random_slope_full)$county %>% 
  as.data.frame() %>% 
  pull( `(Intercept)`)

# make a data frame with these values
random_effects <- tibble(county = 1:85, 
                         uranium = uranium$uranium,
                         alpha_hat = alpha_hat,
                         alpha_se = alpha_se)

#plot the estimated alpha against the uranium measurements

random_effects %>% 
  ggplot(aes(uranium, alpha_hat)) +
  geom_point() +
  geom_errorbar(aes(ymin= alpha_hat -alpha_se,
                    ymax = alpha_hat + alpha_se)) +
  geom_abline(intercept = fixed_effects[1], slope = fixed_effects[3])